pacman::p_load(sf, tmap, tidyverse)Hands On Exercise 2: Thematic Mapping and GeoVisualisation with R
Overview
Let’s start with learning what a choropleth map is. A choropleth map is a thematic map that uses shades or patterns to visualize measurement of variables. For example, a choropleth map can be used to visualize population density or crime rate in a certain area.
In this exercise, we are looking into plotting maps choropleth maps using the tmap package in R.
Package Setup
This exercise will primarily use the tmap package to create choropleths. Additionally, the sf package will be used to handle geospatial data, and tidyverse (specifically readr, tidyr, and dplyr) will also be used for processing data.
This code chunk will install and load the packages
Data Sets
Two datasets will be used to create the choropleth map
Master Plan 2014 Subzone Boundary (Web) (i.e. MP14_SUBZONE_WEB_PL) in ESRI shapefile format. It can be downloaded at data.gov.sg This is a geospatial data. It consists of the geographical boundary of Singapore at the planning subzone level. The data is based on URA Master Plan 2014.
Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020 in csv format (i.e. respopagesextod2011to2020.csv). This is an aspatial data fie. It can be downloaded at Department of Statistics, Singapore Although it does not contain any coordinates values, but it’s PA and SZ fields can be used as unique identifiers to geocode to MP14_SUBZONE_WEB_PL shapefile.
Data Import
This code chunk will use st_read() function to import MP14_SUBZONE_WEB_PL into R
mpsz <- st_read(dsn = "./data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\Users\yozaf\SMUY3S2\Geospatial\IS415-GAA\Hands-on_Ex\Hands-on_Ex02\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
Use this code chunk to examine the content of mpsz
mpszSimple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 10 features:
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
1 1 1 MARINA SOUTH MSSZ01 Y MARINA SOUTH
2 2 1 PEARL'S HILL OTSZ01 Y OUTRAM
3 3 3 BOAT QUAY SRSZ03 Y SINGAPORE RIVER
4 4 8 HENDERSON HILL BMSZ08 N BUKIT MERAH
5 5 3 REDHILL BMSZ03 N BUKIT MERAH
6 6 7 ALEXANDRA HILL BMSZ07 N BUKIT MERAH
7 7 9 BUKIT HO SWEE BMSZ09 N BUKIT MERAH
8 8 2 CLARKE QUAY SRSZ02 Y SINGAPORE RIVER
9 9 13 PASIR PANJANG 1 QTSZ13 N QUEENSTOWN
10 10 7 QUEENSWAY QTSZ07 N QUEENSTOWN
PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
1 MS CENTRAL REGION CR 5ED7EB253F99252E 2014-12-05 31595.84
2 OT CENTRAL REGION CR 8C7149B9EB32EEFC 2014-12-05 28679.06
3 SR CENTRAL REGION CR C35FEFF02B13E0E5 2014-12-05 29654.96
4 BM CENTRAL REGION CR 3775D82C5DDBEFBD 2014-12-05 26782.83
5 BM CENTRAL REGION CR 85D9ABEF0A40678F 2014-12-05 26201.96
6 BM CENTRAL REGION CR 9D286521EF5E3B59 2014-12-05 25358.82
7 BM CENTRAL REGION CR 7839A8577144EFE2 2014-12-05 27680.06
8 SR CENTRAL REGION CR 48661DC0FBA09F7A 2014-12-05 29253.21
9 QT CENTRAL REGION CR 1F721290C421BFAB 2014-12-05 22077.34
10 QT CENTRAL REGION CR 3580D2AFFBEE914C 2014-12-05 24168.31
Y_ADDR SHAPE_Leng SHAPE_Area geometry
1 29220.19 5267.381 1630379.3 MULTIPOLYGON (((31495.56 30...
2 29782.05 3506.107 559816.2 MULTIPOLYGON (((29092.28 30...
3 29974.66 1740.926 160807.5 MULTIPOLYGON (((29932.33 29...
4 29933.77 3313.625 595428.9 MULTIPOLYGON (((27131.28 30...
5 30005.70 2825.594 387429.4 MULTIPOLYGON (((26451.03 30...
6 29991.38 4428.913 1030378.8 MULTIPOLYGON (((25899.7 297...
7 30230.86 3275.312 551732.0 MULTIPOLYGON (((27746.95 30...
8 30222.86 2208.619 290184.7 MULTIPOLYGON (((29351.26 29...
9 29893.78 6571.323 1084792.3 MULTIPOLYGON (((20996.49 30...
10 30104.18 3454.239 631644.3 MULTIPOLYGON (((24472.11 29...
Next, we will import respopagsex2011to2020.csv file into RStudio and save the file into an R dataframe called popdata.
We will use the read_csv() function to perform the task
popdata <- read_csv("./data/aspatial/respopagesextod2011to2020.csv")Data Preparation
Before preparing a thematic map, we need to prepare a data table with values from year 2020. Here is the required variables:
- YOUNG: age group 0 to 4 until age groyup 20 to 24,
- ECONOMY ACTIVE: age group 25-29 until age group 60-64,
- AGED: age group 65 and above,
- TOTAL: all age group, and
- DEPENDENCY: the ratio between young and aged against economy active group
We can use this code chunk to get the values
popdata2020 <- popdata %>%
filter(Time == 2020) %>%
group_by(PA, SZ, AG) %>%
summarise(`POP` = sum(`Pop`)) %>%
ungroup()%>%
pivot_wider(names_from=AG,
values_from=POP) %>%
mutate(YOUNG = rowSums(.[3:6])
+rowSums(.[12])) %>%
mutate(`ECONOMY ACTIVE` = rowSums(.[7:11])+
rowSums(.[13:15]))%>%
mutate(`AGED`=rowSums(.[16:21])) %>%
mutate(`TOTAL`=rowSums(.[3:21])) %>%
mutate(`DEPENDENCY` = (`YOUNG` + `AGED`)
/`ECONOMY ACTIVE`) %>%
select(`PA`, `SZ`, `YOUNG`,
`ECONOMY ACTIVE`, `AGED`,
`TOTAL`, `DEPENDENCY`)Then, we need to join the attribute data with the geospatial data. But before we can do that, we need to convert to values in PA and SZ fields to uppercase so that it matches the SUBZONE_N and PLN_AREA_N
popdata2020 <- popdata2020 %>%
mutate_at(.vars = vars(PA, SZ),
.funs = list(toupper)) %>%
filter(`ECONOMY ACTIVE` > 0)After we run the code chunk above, we can use the left_join function to join the tables together, where SUBZONE_N and SZ are the common identifier
mpsz_pop2020 <- left_join(mpsz, popdata2020,
by = c("SUBZONE_N" = "SZ"))Then, save the mpsz_pop2020 to a .rds file
write_rds(mpsz_pop2020, "./data/mpszpop2020.rds")Choropleth Mapping Using tmap
As mentioned before, Choropleth mapping uses shades of colors or patterns in visualizing geospatial data.
Two approaches in preparing thematic map in tmap: 1. Using qtm(): quick and simple 2. Using tmap elements: customizable
Using qtm()
For simple thematic maps, qtm() provides a concise solution. the fill argument is where we put which attribute we want to map
tmap_mode("plot")
qtm(mpsz_pop2020,
fill = "DEPENDENCY")
Using tmap_mode(‘plot’), we can generate a static map, while tmap_mode(‘view’) gives us an interactive mode
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
qtm(mpsz_pop2020,
fill = "AGED")Using tmap elements
While qtm() is useful and concise, sometimes we might need something more customizable, and that is where tmap elements can come into play. Here is an example of a thematic map generated with tmap elements
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
title = "Dependency ratio") +
tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2) +
tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS",
position = c("left", "bottom"))Now we will break down the functions used to plot these elements
Base map
tm_shape() is the basic building block, followed by one more layer elements such as tm_fill() and tm_polygons()
tm_shape(mpsz_pop2020) + tm_polygons()Drawing the map with tm_polygons
We can use tm_polygons(variable), where variable refers to the target variable we want to assign
tm_shape(mpsz_pop2020)+
tm_polygons("DEPENDENCY")####Drawing the map with tm_fill() and tm_border() tm_polygons() is a wrapper of tm_fill() and tm_border(), where tm_fill shades the polygons, while tm_border() gives the borders.
#tm_fill alone
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY")#tm_border alone
tm_shape(mpsz_pop2020)+
tm_borders(lwd=0.1, alpha=1)#Combined
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY") +
tm_borders(lwd = 0.1, alpha = 1)Data classification in tmap
Most choropleth maps uses some type of data classification, to group large number of observations to certain ranges or classes.
tmap provides a total ten data classification methods, namely: fixed, sd, equal, pretty (default), quantile, kmeans, hclust, bclust, fisher, and jenks. To define which method to use, the style argument from the tm_polygons() or tm_fill() layer is used.
Built-in methods
#Jenks
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 5,
style = "jenks") +
tm_borders(alpha = 0.5)#Equal
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 5,
style = "equal") +
tm_borders(alpha = 0.5)tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 3,
style = "sd") +
tm_borders(alpha = 0.5)tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 6,
style = "kmeans") +
tm_borders(alpha = 0.5)Different methods have different ways of separating the classes, so choose a method that suits our needs. We also need to find the right number of classes. Too many classes might look too cluttered, while too little classes might not give us enough information to draw any meaningful insights
Custome break
When we need to set the breakpoints by ourselves (instead of using the built-in functions), we can set it explicitly in the breaks argument of tm_fill() (note that to get n classes, input n+1 elements in the breaks argument, in increasing order)
#Good practice to perform EDA before specifying breakpoints
summary(mpsz_pop2020$DEPENDENCY) Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.1111 0.7147 0.7866 0.8585 0.8763 19.0000 92
From the summary above, it gives us a good gauge on how to set our breakpoints.
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
breaks = c(0, 0.60, 0.70, 0.80, 0.90, 1.00)) +
tm_borders(alpha = 0.5)Setting Colour Scheme
We can define our won colour ramps, or use RColorBrewer package
ColourBrewer
We can assign our colour ramps in the palette argument of tm_fill()
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 6,
style = "quantile",
palette = "Blues") +
tm_borders(alpha = 0.5)We can add “-” before the colour to reverse it
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 6,
style = "quantile",
palette = "-Blues") +
tm_borders(alpha = 0.5)Layouts Map
layout is the combination of all map elements
Map Legend
tmap provides several legend options
tmap_mode('plot')
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "jenks",
palette = "Blues",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone \n(Jenks classification)",
main.title.position = "center",
main.title.size = 1,
legend.height = 0.45,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right", "bottom"),
frame = FALSE) +
tm_borders(alpha = 0.5)
#Map Style
tmap also have preset styles, called using tmap_style()
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "-Greens") +
tm_borders(alpha = 0.5) +
tmap_style("classic")
Cartographic Furniture
tmap also has other map furnitures that can be useful, such as compass, scale bar, and grid lines
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
title = "No. of persons") +
tm_layout(main.title = "Distribution of Dependency Ratio \nby planning subzone",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar(width = 0.15) +
tm_grid(lwd = 0.1, alpha = 0.2) +
tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS",
position = c("left", "bottom"))
We can reset to default style with this chunk below
tmap_style("white")Small Multiple Choropleth Maps
Small multiple maps, or facet maps, are just as the name suggests: multiple small maps arranged side-by-side or stacked vertically. This is useful in visualizing how the relationship changes with respect to another variable.
In tmap, there are 3 ways to plot facet maps - by assigning multiple values to at least one - of the asthetic arguments, - by defining a group-by variable in tm_facets(), and - by creating multiple stand-alone maps with tmap_arrange().
Assigining multiple values to at least one of the aesthetic arguments
tm_shape(mpsz_pop2020)+
tm_fill(c("YOUNG", "AGED"),
style = "equal",
palette = "Blues") +
tm_layout(legend.position = c("right", "bottom")) +
tm_borders(alpha = 0.5) +
tmap_style("white")
tm_shape(mpsz_pop2020)+
tm_polygons(c("DEPENDENCY","AGED"),
style = c("equal", "quantile"),
palette = list("Blues","Greens")) +
tm_layout(legend.position = c("right", "bottom"))
Using tm_facets()
tm_shape(mpsz_pop2020) +
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
thres.poly = 0) +
tm_facets(by="REGION_N",
free.coords=TRUE,
drop.shapes=TRUE) +
tm_layout(legend.show = FALSE,
title.position = c("center", "center"),
title.size = 20) +
tm_borders(alpha = 0.5)
Using tmap_arrange() to create multiple stand-alone maps
youngmap <- tm_shape(mpsz_pop2020)+
tm_polygons("YOUNG",
style = "quantile",
palette = "Blues")
agedmap <- tm_shape(mpsz_pop2020)+
tm_polygons("AGED",
style = "quantile",
palette = "Blues")
tmap_arrange(youngmap, agedmap, asp=1, ncol=2)
Mapping Spatial Object That Meets a Criteria
We can also use a selection function to map only objects that meet s the selection criterion
tm_shape(mpsz_pop2020[mpsz_pop2020$REGION_N=="CENTRAL REGION", ])+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(legend.outside = TRUE,
legend.height = 0.45,
legend.width = 5.0,
legend.position = c("right", "bottom"),
frame = FALSE) +
tm_borders(alpha = 0.5)